Constructing parse forests that include exactly the n-best PCFG trees
Abstract
This paper describes and compares two algorithms that take as input a shared PCFG parse forest and produce shared forests that contain exactly the n most likely trees of the initial forest. Such forests are suitable for subsequent processing, such as (some types of) reranking or LFG f-structure computation, that can be performed on top of a shared forest but that may have a high (e.g., exponential) complexity with respect to the number of trees contained in the forest. We evaluate the performance of both algorithms on real-scale NLP forests generated with a PCFG extracted from the Penn Treebank.
Related resources
Estimating Probabilities for an Indonesian Stochastic Parser using the Inside Outside Algorithm
This paper presents work in constructing a Probabilistic Context Free Grammar (PCFG) parser for Indonesian. Due to the unavailability of a large manually parsed corpus, we start from an existing symbolic parser to parse a relatively small collection of Indonesian sentences. A PCFG language model is extracted, ignoring explicit linguistic information encoded in feature structures, and is subsequ...
An Effective Framework for Chinese Syntactic Parsing
This paper presents an effective framework for Chinese syntactic parsing, which includes two parts. The first one is a parsing framework, which is based on an improved bottom-up chart parsing algorithm, and integrates the idea of the beam search strategy of N best algorithm and heuristic function of A* algorithm for pruning, then get multiple parsing trees. The second is a novel evaluation mode...
Generalized Queries on Probabilistic Context-
Probabilistic context-free grammars (PCFGs) provide a simple way to represent a particular class of distributions over sentences in a context-free language. Efficient parsing algorithms for answering particular queries about a PCFG (i.e., calculating the probability of a given sentence, or finding the most likely parse) have been applied to a variety of pattern-recognition problems. We extend t...
Parsing with PCFGs and Automatic F-Structure Annotation
The development of large coverage, rich unification- (constraint-) based grammar resources is very time consuming, expensive and requires lots of linguistic expertise. In this paper we report initial results on a new methodology that attempts to partially automate the development of substantial parts of large coverage, rich unification- (constraint-) based grammar resources. The method is based on ...
Generalized Queries on Probabilistic Context-Free Grammars
Probabilistic context-free grammars (PCFGs) provide a simple way to represent a particular class of distributions over sentences in a context-free language. Efficient parsing algorithms for answering particular queries about a PCFG (i.e., calculating the probability of a given sentence, or finding the most likely parse) have been developed, and applied to a variety of pattern-recognition problem...
Publication date: 2009